R-Tree

What is an R-Tree?

R-Trees are data structures used to efficiently store and search for rectangles or spatial objects (like bounding boxes). Each node in the tree stores a bounding box that covers its children. Leaf nodes contain the actual objects (e.g., points, rectangles), and non-leaf nodes contain bounding boxes that group those objects together. When inserting a new object, the tree finds the node whose box needs the least enlargement to include the new one. If a node gets too full, it splits. Where It's Used Geographic Information Systems (GIS) (e.g., finding map features in a region) Game engines (to detect objects in view or near each other) Databases (spatial indexing of coordinates) Computer graphics (fast collision and intersection detection) Strengths Very efficient for range queries (e.g., "find all objects in this area") Works well for rectangular or spatial data Can handle overlapping or irregular shapes Supports dynamic insertions and deletions Weaknesses Overlapping bounding boxes in nodes can reduce performance Balancing the tree isn't always perfect (it’s not strictly height-balanced like AVL trees) Can be less efficient than KD-Trees for exact nearest-neighbor searches

R-Tree Operations

Construction:

Start with empty tree
Insert entries one by one using insertion algorithm

Insertion:

Choose Leaf: Traverse to node needing least bounding box expansion
Insert Entry: Add object and update bounding boxes
Split If Needed: On overflow:
- Apply quadratic or linear split
- Propagate adjustments upward

Search:

Range Query: Visit nodes overlapping the query region
Nearest Neighbor: Use best-first traversal with distance pruning

Deletion:

Find and remove target entry
Condense tree and reinsert orphaned nodes

Complexities:

Operation	Average	Worst
Bulk Loading	O(n log n)	O(n²)
Insertion	O(log n)	O(n)
Deletion	O(log n)	O(n)
Range Query	O(logn + m)	O(n)
Nearest Neighbor	O(log n)	O(n)
k-NN Query	O(k log n)	O(kn)

Bulk Loading:

R-Trees can be built efficiently using bulk-loading algorithms like Sort-Tile-Recursive (STR), which sort and group spatial objects to minimize overlap. This process reduces tree height and clustering issues, and typically takes O(n log n) time due to sorting and hierarchical insertion.

Insertion/Deletion:

Standard insertions and deletions both traverse from the root to a leaf, making decisions at each level. These operations take O(log n) on average since the tree remains balanced and shallow due to grouping of multiple entries per node.

k-NN Search:

To find the k closest points, R-Trees use a priority queue to explore nodes ordered by distance. This allows pruning of far regions early, leading to O(k log n) on average. As with NN, poor clustering or overlapping rectangles can worsen performance.

Range Query:

Range queries recursively check which bounding rectangles intersect the query window. The tree’s height is log n, and only overlapping branches are explored. With m results returned, complexity is O(log n + m) in most cases.

R-Tree: